Abstract —The new frontier in cellular networks is harnessing 
the enormous spectrum available at millimeter wave (mmWave) 
frequencies above 28 GHz. The challenging radio propagation 
characteristics at these frequencies, and the use of highly 
directional beamforming, lead to intermittent links between the 
base station (BS) and the user equipment (UE). In this paper, we 
revisit the problem of cell selection to maintain an acceptable level 
of service, despite the underlying intermittent link connectivity 
typical of mmWave links. We propose a Markov Decision Process 
(MDP) framework to study the properties and performance 
of our proposed cell selection strategy, which jointly considers 
several factors such as dynamic channel load and link quality. 
We use the Value Iteration Algorithm (VIA) to solve the MDP, 
and obtain the optimal set of associations. We address the 
multi-user problem through a distributed iterative approach, in 
which each UE characterizes the evolution of the system based 
on stationary channel distribution and cell selection statistics 
of other UEs. Through simulation results, we show that our 
proposed technique makes judicious handoff choices, thereby 
providing a significant improvement in the overall network 
capacity. Further, our technique reduces the total number of 
handoffs, thus lowering the signaling overhead, while providing 
a higher quality of service to the UEs. 

Index Terms —Cellular; mmWave; 5G; Handover; MDP. 

I. Introduction 

Cell selection is a fundamental functionality in wireless 
networks, and involves choosing which base station (BS) the 
user equipment (UE) should be connected to. Current cellular 
networks that operate in the microwave bands (around 2 
GHz) use simple heuristics to perform cell selection, usually 
choosing the BS that provides the highest long-term signal to 
noise ratio (SNR) [1]. 

In this paper, we revisit the cell selection problem in the 
context of next-generation cellular networks. These networks 
are expected to use millimeter wave (mmWave) technology, 
that operates at frequencies above 28 GHz, thereby exploiting 
the enormous amount of spectrum available in these bands. 
At these frequencies, the radio propagation characteristics 
are starkly different from their microwave counterparts. First, 
according to the Friis transmission equation [?], the path loss 
can easily exhibit 30-40 dB more attenuation. This higher 
path loss necessitates the use of narrow, highly 
directional beams, which can be realized through phased antenna 
arrays, whose implementation is made possible thanks to 
the smaller wavelengths that correspond to these frequencies. 
Furthermore, due to the exacerbated blockage and shadowing 
effects [?], the wireless links will exhibit rapid variations 
in quality, thereby leading to severe intermittency in link 
connectivity between the UE and the BS. 

To address these challenges, and in particular to maintain 
an acceptable level of service despite this intermittency, the 
density of BSs in mmWave cellular networks is expected to be 
an order of magnitude higher than in current systems [?]. The 
UEs will track several BSs simultaneously and rapidly switch 
between them in response to the fast-varying link qualities [?]. 
A simple approach to cell selection would be for each UE to 
greedily pick the BS that is instantaneously the best, thereby 
attempting to keep an optimal state for itself. Unfortunately, 
such approaches may lead to degraded performance for other 
UEs [?], [?], or even to instability. In addition, they entail 
significant overhead because of frequent signaling in the control 
plane due to the large number of BS handovers that would 
result under such a policy. Therefore, a better approach would 
need to consider the network behavior and to look for solutions 
where all relevant information (including channel conditions 
and BS load) is explicitly included in the optimization. 

The problem of cell selection in mmWave networks, as 
well as in macrocellular networks with high mobility [?], 
has received considerable attention over the past few years. 
In [?], Talukdar et al. conclude that in mmWave the UE 
will remain associated with a BS for just a few seconds 
and, in some cases, for as little as 0.75 s. In [?], 
Shokri-Ghadikolaei et al. study the implications of the mmWave PHY 
on the MAC layer and argue that, if simple cell selection 
techniques based on SNR were used, the handovers would 
become too frequent. Further, loss of channel information and 
outdated beamforming vectors will also lead to more frequent 
outages and expensive cell discovery searches. The impact of 
network load on cell selection in dense pico-cell environments 
was studied by Ye et al. in [?]. They argue that considering 
SINR alone leads to sub-optimal assignments, and that the optimal 
approach is therefore to solve cell selection and resource 
allocation jointly. 

One of the most popular techniques to study the problem 
of cell selection as well as possible handoff strategies is 
through Markov Decision Processes (MDPs) [10], [11], [12], 
which provide a useful mathematical framework for studying 
the properties and performance of proposed cell selection 
strategies. However, in order to be useful, MDPs need to be 
carefully applied. In particular, it is important that all factors 
that play a key role in determining the goodness of a cell 
selection solution be included in the model. 


Previous studies that only use partial information may lead 
to sub-optimal results. For example, Dang et al. in [11] 
and Pan et al. in [12] do not consider dynamic variations 
in the network load to be an input to the cell selection 
algorithm. Furthermore, [11] considers a network with just 
one UE, and therefore does not capture the global 
network-level performance effects of the proposed technique. In [10], 
Stevens-Navarro et al. consider a network with just one BS, 
but with multiple relay nodes. 

A more comprehensive study of the cell association problem 
will need to (i) consider a network with multiple BSs, where 
the network load can vary dynamically, and (ii) explicitly 
include in the optimization problem the key parameters that 
affect the performance (at both the network and the user level) 
in a multi-cell multi-user scenario, including cell load and 
channel conditions. Our contribution in this paper is to develop 
such an approach and compare its performance to schemes that 
use only partial information. 

The paper is organized as follows. In Sec. II we introduce 
our model formulation. In Sec. III we describe our decision 
algorithm and the iterative process applied to handle the 
multi-agent nature of the problem. In Sec. IV we discuss some 
simulation results. In Sec. V we analytically derive the 
number of states for varying configurations. Finally, we conclude 
the paper and propose some future work in Sec. VI. 

II. Model formulation 

In our cell association problem, each UE can connect to a 
set of L surrounding BSs. The time evolution of the quality 
of each link is described by a Markov process with K states. 
N represents the total number of UEs in the system. As a 
first step towards more general scenarios, we consider the 
situation in which all links have the same statistics. This can 
be justified in mmWave scenarios where the channel can be 
assumed to alternate between well-defined states (e.g., line- 
of-sight (LOS) and non-LOS) and all LOS links (non-LOS 
links) can be considered to be equally good (equally bad) on 
average. More general models where different statistics may 
be associated to different links will be left for future work. 

Although the framework is general, we will illustrate the 
methodology using a simplified version of the mmWave 
channel model described in [?]. In this model, each link is 
characterized by three possible states: 

• outage, where no mmWave link is available; 

• LOS, where a direct LOS mmWave link is available; 

• NLOS, where only a non-LOS mmWave link is available. 

The presence of the outage state, which occurs due to 
blockage, and the highly dynamic behavior of the channel, 
which can move in and out of the outage state on a very short 
time scale, are unique to the mmWave model. In practice, 
the rate that a mobile experiences in any state depends on 
several conditions, including interference and SNR. However, 
to simplify the study, we will assume that the rate is uniquely 
determined by the state. A more complex model, which 
includes the SNR or other variations within each state, can 
also be accommodated. The transmission rates in the LOS and 
NLOS states are based approximately on the average spectral 
efficiencies in those states, as presented in [?]. 

More specifically, the statistical model provided in [?] 
gives the probability that a UE is in each of the three states 
based on its distance from the BS. Assuming the UE is 
randomly dropped in each cell with a radius of 200 m (a typical 
cell radius in the mmWave range), we computed the 
steady-state probability of each state, $\pi$. Then, via quadratic 
programming optimization, we obtain a matrix $P$ such that diag($P$) 
is close to $1 - 1/T_{avg}$ (element-wise), where 
$T_{avg} = [t_{LOS}, t_{NLOS}, t_{out}]$ is the 
average time spent in each state before leaving it. This 
matrix is also consistent with the steady-state equation, $\pi P = \pi$. 

Following this approach, we model the channel conditions 
seen by each UE towards the L BSs as NL i.i.d. Markov 
processes with common transition probability matrix 

\[ P = \begin{bmatrix} p_{out-out} & p_{out-NLOS} & p_{out-LOS} \\ p_{NLOS-out} & p_{NLOS-NLOS} & p_{NLOS-LOS} \\ p_{LOS-out} & p_{LOS-NLOS} & p_{LOS-LOS} \end{bmatrix}. \tag{1} \]
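
The relation between average sojourn times and the diagonal of the transition matrix rests on a standard property of discrete-time Markov chains: the number of consecutive slots spent in state i is geometrically distributed with mean 1/(1 - p_ii). The minimal Python sketch below only illustrates this relation; it is not the quadratic program used to obtain the matrix. The function names are hypothetical, and measuring the sojourn times in slots is an assumption.

```python
def target_diagonal(avg_sojourn_slots):
    """Self-transition probabilities p_ii whose mean sojourn time is T_i slots.

    Uses the geometric-sojourn relation T_i = 1 / (1 - p_ii), valid for T_i >= 1.
    """
    if any(t < 1 for t in avg_sojourn_slots):
        raise ValueError("average sojourn times must be at least one slot")
    return [1.0 - 1.0 / t for t in avg_sojourn_slots]


def mean_sojourn(p_stay):
    """Mean number of consecutive slots spent in a state with p_ii = p_stay < 1."""
    return 1.0 / (1.0 - p_stay)


# Illustrative input: sojourn times for three states, in slots (an assumption).
t_avg = [5, 25, 3]
diag = target_diagonal(t_avg)
recovered = [mean_sojourn(p) for p in diag]  # maps back to the sojourn times
```

The off-diagonal entries are not fixed by this relation alone; in the text they follow from the additional steady-state constraint.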

Each connection is defined as 

\[ C_i^k = \{K_i, U_i\}, \tag{2} \]

where $K_i$ and $U_i$ represent the K-quantized channel state 
characterizing the link between a generic UE, k, and BS i and 
the number of UEs connected to BS i, respectively. 

The state space is defined as a subset of 

\[ S = C_P^k \times C_S^k \tag{3} \]

constrained by 

\[ \sum_{i=1}^{L} U_i = N, \tag{4} \]

and where 

\[ C_S^k = [C_1^k \times \dots \times C_{L-1}^k]. \tag{5} \]

In a state, $C_P^k$ describes the primary connection, i.e., the 
BS serving UE k, and $C_S^k$ represents the characterization 
of the L − 1 surrounding links (between UE k and the 
non-serving BSs). The state space S contains all possible 
combinations of channel conditions towards the different BSs 
and load occupancy at each BS. Due to the symmetry of the 
system model, a single state can represent multiple situations 
(according to all possible permutations of a given scenario), 
which leads to a significant reduction of the number of states 
needed to represent the possible system configurations, and 
therefore to a better scalability of the model. To assess the 
complexity of these models, we analytically derive the number 
of those states in Section V. 

The action is defined as the identifier of the cell that the 
UE will join at the next step, i.e., 

\[ a \in A_s, \quad A_s = \{1, \dots, L\}, \tag{6} \]

which corresponds to a handover if a is different from the 
current serving cell. Here, $A_s$ represents the set of all BSs. 

Algorithm 1 MDP-based Handover 

1: Initialization: initial policy $\mathcal{D}_0^m$ for each UE $m \in \{1, 2, \dots, N\}$ 
2: For each iteration (n > 0) 
     Select a UE: k = mod(n, N) + 1 
     Update the policies: 
       For UE k: 
         update $p(s_j \mid s_i, a)$ from the policies $\mathcal{D}_{n-1}^m$ of the other UEs, $m \neq k$ 
         $\mathcal{D}_n^k$ = VIA(p, r) 
       For the other UEs $m \in \{1, 2, \dots, N\} \setminus \{k\}$: 
         $\mathcal{D}_n^m = \mathcal{D}_{n-1}^m$ 
     n = n + 1 
   Until convergence 

In an MDP, the statistics of the next state depend only 
on the current state and on the decision made. Therefore, we 
need to define the transition probabilities, $p(s_j \mid s_i, a)$, i.e., the 
probabilities of moving to state $s_j$ given the current state 
$s_i$ and under action a, where $s_i, s_j \in S$. The transition 
probabilities must satisfy the condition 

\[ \sum_{s_j \in S} p(s_j \mid s_i, a) = 1, \quad \forall s_i \in S,\ a \in A_s. \tag{7} \]

Our link reward function $r(s_i, a)$ is the average of the 
instantaneous reward over all possible destination states. We let $r_t(s_i, a, s_j)$ 
denote the value at time t of the instantaneous reward received 
given that the state of the system at decision epoch t is $s_i$, 
action $a \in A_s$ is selected, and the system is in state $s_j$ at 
decision epoch t + 1. Its expected value at decision epoch t 
can be evaluated by computing 

\[ r(s_i, a) = \sum_{s_j \in S} r_t(s_i, a, s_j)\, p_t(s_j \mid s_i, a), \tag{8} \]

where 

\[ r_t(s_i, a, s_j) = \bigl(1 - c(s_i, s_j, a)\bigr) \frac{R_{s_j}}{U_a + 1}. \tag{9} \]

$R_{s_j}$ is the achievable rate the UE would enjoy if it were the 
only UE in its cell, and only depends on the channel quality 
of state $s_j$, whereas $R_{s_j}/(U_a + 1)$ is the rate actually available when 
the cell load $U_a$ on the target BS a is taken into account.^1 
$c(s_i, s_j, a)$ is the handover cost function, which is equal to 
OH if the UE moves from the associated BS in state $s_i$ to a 
different BS in state $s_j$, and is otherwise equal to zero. We 
define the value of OH as a percentage of the spectrum that 
needs to be used for signaling. 

^1 Note that in this formulation the cell load does not include the incoming 
user (which instead accounts for the "+1" in the denominator). Also, $U_a + 1$ is 
an estimate based on the status of the BS occupancy in the previous slot, and 
does not necessarily represent the true state, which depends on the decisions 
being made in the current slot (i.e., other users leaving or joining that BS). 
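
As a rough illustration of the reward in (8) and (9), the following Python sketch evaluates the instantaneous reward for a single transition and then averages it over a set of possible next states. The per-state rates in `RATE`, the function names, and the two-outcome example are assumptions made for illustration and do not come from the simulation setup.

```python
def instantaneous_reward(rate_alone, load_on_target, is_handover, overhead):
    """Evaluate (1 - c) * R / (U_a + 1) for one transition.

    rate_alone     -- R, rate if the UE were alone in the target cell
    load_on_target -- U_a, UEs already on the target BS (incoming UE excluded)
    is_handover    -- True if the target BS differs from the serving BS
    overhead       -- OH, handover cost expressed as a fraction in [0, 1]
    """
    cost = overhead if is_handover else 0.0
    return (1.0 - cost) * rate_alone / (load_on_target + 1)


def expected_reward(outcomes):
    """Average the instantaneous reward over next states.

    outcomes -- iterable of (transition probability, instantaneous reward) pairs
    """
    return sum(prob * reward for prob, reward in outcomes)


# Hypothetical per-state rates in bits/s/Hz (assumed values, not from the paper).
RATE = {"out": 0.0, "NLOS": 1.0, "LOS": 4.0}

# Handover to a BS already serving one UE, landing in LOS or NLOS with equal odds.
r_los = instantaneous_reward(RATE["LOS"], 1, True, 0.10)
r_nlos = instantaneous_reward(RATE["NLOS"], 1, True, 0.10)
r_avg = expected_reward([(0.5, r_los), (0.5, r_nlos)])
```

With a 10% overhead, the handover scales both rates by 0.9 before the load sharing is applied, so staying on the serving BS is preferred whenever the rate advantage of the target does not offset this loss.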


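Algorithm 2 Value Iteration Algorithm (VIA(p, r)) 

1: Select $v^0 \in V$, specify $\epsilon, \omega > 0$, and set n = 0 
2: For each $s_i \in S$, compute $v^{n+1}(s_i)$ by 
     $v^{n+1}(s_i) = \max_{a \in A_s} \{ r(s_i, a) + \sum_{s_j \in S} \omega\, p(s_j \mid s_i, a)\, v^n(s_j) \}$ 
3: if $\|v^{n+1} - v^n\| < \epsilon (1 - \omega) / 2\omega$ then 
     goto Step 6 
4: else 
     n ← n + 1 
     goto Step 2 
5: end if 
6: For each $s_i \in S$, choose 
     $d(s_i) = \arg\max_{a \in A_s} \{ r(s_i, a) + \sum_{s_j \in S} \omega\, p(s_j \mid s_i, a)\, v^{n+1}(s_j) \}$ 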
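
The steps of the value iteration algorithm can be sketched in Python as follows. This is a generic discounted value iteration under stated assumptions, not the authors' implementation: the data layout (`p[a][i][j]`, `r[i][a]`), the use of the sup norm in the stopping test, and the two-state toy MDP are all illustrative choices.

```python
def value_iteration(p, r, omega, eps=1e-6):
    """Discounted value iteration.

    p[a][i][j] -- probability of moving from state i to state j under action a
    r[i][a]    -- expected one-step reward for taking action a in state i
    omega      -- discount factor, 0 < omega < 1
    eps        -- target accuracy of the returned value function
    """
    n_states, n_actions = len(r), len(r[0])

    def backup(v, i, a):
        # One-step lookahead: r(s_i, a) + omega * sum_j p(s_j | s_i, a) v(s_j).
        return r[i][a] + omega * sum(p[a][i][j] * v[j] for j in range(n_states))

    v = [0.0] * n_states
    threshold = eps * (1.0 - omega) / (2.0 * omega)
    while True:
        v_next = [max(backup(v, i, a) for a in range(n_actions))
                  for i in range(n_states)]
        gap = max(abs(x - y) for x, y in zip(v_next, v))  # sup norm (assumed)
        v = v_next
        if gap < threshold:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [max(range(n_actions), key=lambda a, i=i: backup(v, i, a))
              for i in range(n_states)]
    return v, policy


# Toy MDP (hypothetical): action 0 keeps the state, action 1 swaps it.
p = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
r = [[0.0, 0.0],   # state 0 pays nothing
     [1.0, 0.0]]   # state 1 pays 1 only when staying
values, policy = value_iteration(p, r, omega=0.9)
```

In the toy example the greedy policy moves out of the unrewarded state and then stays in the rewarded one, which is the behavior one would expect from a handover policy that trades a one-off switch for a better long-run rate.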



III. MDP-based handover decision 

In this section, we describe our algorithm to obtain the 
optimal cell selection strategy for each UE. We use a distributed 
iterative approach in which each UE finds its optimal 
deterministic policy under the assumption that all other UEs make handover 
decisions based on the current BS occupancy and on steady-state 
(rather than instantaneous) channel conditions, which results in an approximated cell 
occupancy evolution.^2 

The proposed algorithm is described in Algorithm 1. 
It initializes the system with a random policy assignment 
$\mathcal{D}_0^i$ to each UE $i \in \{1, 2, \dots, N\}$, where 
$\mathcal{D}_n^i = (d^n(s_1), d^n(s_2), \dots, d^n(s_j), \dots)$ contains the set of actions 
for all states $s_j \in S$ for UE i at iteration n. This algorithm 
runs for multiple iterations. In each iteration, a UE k is selected 
sequentially among the given set of UEs. For the selected UE, 
the policy is updated based on the policies of the other UEs 
at the previous iteration, denoted by $\mathcal{D}_{n-1}^m$. We introduce the 
following definitions. 

\[ s(t) = [(K_1(t), U_1(t)), \dots, (K_L(t), U_L(t))] \tag{10} \]

is the state seen by the selected UE, i.e., the UE that is 
updating its policy. On the other hand, 

\[ s^*(t) = [(\cdot, U_1(t)), \dots, (\cdot, U_L(t))] \tag{11} \]

is the approximate description of the state associated with 
all the other UEs, which refers to the fact that for those 
UEs we define a decision strategy that only refers to the cell 
occupancy, thereby avoiding the need to track instantaneous 
channel conditions for everyone. That being said, we can 
define the probability for any UE (say UE x) to select BS i 
starting from any state s*{t) as 

\[ \Pr\{\text{UE } x \text{ selects BS } i \mid s^*(t)\} = \sum_{m=1}^{K^L} \pi(s_m(t))\, \mathbb{1}_i\bigl(d_x(s_m(t))\bigr), \tag{12} \]

where $K^L$ is the total number of channel states' combinations, 
and $\mathbb{1}_i(d_x(s_m(t)))$ is calculated from the policy of the 
corresponding UE: 

\[ \mathbb{1}_i\bigl(d_x(s_m(t))\bigr) = \begin{cases} 1, & i = d_x(s_m(t)) \\ 0, & i \neq d_x(s_m(t)). \end{cases} \tag{13} \]

Here, $\pi(s_m(t))$ represents the steady-state distribution of 
the channel. Because we have introduced a probabilistic 
evolution of each UE, we can update the transition matrix 
accordingly. Then, based on the updated transition probability 
matrix (P), the Value Iteration Algorithm (VIA) described 
in Algorithm 2 is used to solve the MDP, which gives the 
optimal deterministic policy of the selected UE. This way, the 
load occupancy dynamics are captured, allowing us to 
evaluate the overall performance of a fully characterized 
multi-user system. In the VIA, v(s) denotes the maximum 
expected total reward, $\omega$ represents the discount factor, which 
determines the effective length of the analyzed horizon, and r is the vector 
containing the reward values. For the other UEs, the policy is 
retained from the previous iteration. The reason for solving 
the MDP for one UE per iteration is to reach convergence: when 
multiple UEs change their policies simultaneously, we observed 
that the algorithm does not converge and instead oscillates 
indefinitely. This is a well-known property of 
multi-user distributed solutions, where the performance of a user is 
strongly coupled with the performance of the other users. For 
example, this issue is discussed in the context of distributed 
power control in a multi-cell 4G network [14]. In this paper, 
we do not provide a formal proof of the convergence of the 
proposed algorithm, which is left for future work. 

^2 We note that a precise model would need to keep track of the channel 
conditions from all UEs to all BSs, which is clearly an impossible task. 

Figure 1: Average Spectral Efficiency (bits/s/Hz) for different schemes with 
3 BSs, handover cost = 10%. 
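
The averaging of another UE's policy over the steady-state channel distribution, as in (12) and (13), can be sketched in Python as below. The sketch assumes that the joint channel probability factorizes across links, which follows from the i.i.d. link model; the function names, the callable policy interface, and the toy best-channel policy are hypothetical.

```python
from itertools import product


def selection_probabilities(policy, occupancy, pi, n_bs):
    """Probability that a UE following `policy` picks each BS, given only occupancy.

    policy    -- callable (channels, occupancy) -> BS index in range(n_bs)
    occupancy -- tuple with the number of UEs on each BS
    pi        -- steady-state probability of each of the K channel states
    n_bs      -- number of BSs, L
    """
    probs = [0.0] * n_bs
    for channels in product(range(len(pi)), repeat=n_bs):  # all K**L combinations
        weight = 1.0
        for state in channels:
            weight *= pi[state]  # links are i.i.d., so probabilities multiply
        probs[policy(channels, occupancy)] += weight
    return probs


def best_channel_policy(channels, occupancy):
    """Hypothetical policy: pick the BS with the highest channel index (ties: lowest BS)."""
    return max(range(len(channels)), key=lambda i: channels[i])


# Two BSs, two equally likely channel states per link (toy values).
probs = selection_probabilities(best_channel_policy, (1, 1), [0.5, 0.5], n_bs=2)
```

Because the sum runs over every channel combination, the resulting probabilities depend on the occupancy only, which is what allows the selected UE to model the others without tracking their instantaneous channels.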

In summary, (i) we derive the optimal policy at each UE 
as a function of its detailed state (i.e., channel state plus 
occupancy), and find the related steady-state distribution; (ii) 
we converge to an optimal equilibrium through an iterative 
process by averaging the previous policies of all users over 
the conditional channel distribution at each iteration. 
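
The one-UE-per-iteration update can be sketched in Python as a round-robin loop around a per-UE solver. This is a simplified illustration under several assumptions: each policy is reduced to a single BS index instead of a full state-to-action map, the hypothetical `best_response` callable stands in for the update of the transition probabilities followed by the VIA, and convergence is declared when a full sweep over the UEs changes nothing.

```python
def round_robin_policy_iteration(policies, best_response, max_iters=1000):
    """Update one UE per iteration until a full sweep leaves every policy unchanged.

    policies      -- list with the current policy of each UE (modified in place)
    best_response -- callable (ue_index, policies) -> new policy for that UE
    Returns (policies, converged).
    """
    n_ues = len(policies)
    unchanged = 0
    for n in range(max_iters):
        k = n % n_ues  # zero-based round-robin selection of the UE to update
        new_policy = best_response(k, policies)
        if new_policy == policies[k]:
            unchanged += 1
        else:
            policies[k] = new_policy
            unchanged = 0
        if unchanged == n_ues:
            return policies, True
    return policies, False


def least_loaded_response(k, policies, n_bs=3):
    """Toy stand-in for the per-UE solver: join the BS with the fewest other UEs."""
    loads = [0] * n_bs
    for ue, bs in enumerate(policies):
        if ue != k:
            loads[bs] += 1
    return min(range(n_bs), key=lambda b: loads[b])


final, converged = round_robin_policy_iteration([0, 0, 0], least_loaded_response)
```

In this toy setting the sequential updates spread the three UEs over the three BSs and then stop changing, whereas letting all UEs respond at once from the same starting point would move them together and keep them colliding.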



Figure 2: Number of Handovers for all different schemes with 3 BSs, handover 
cost = 10%. 

IV. Simulation results 

In this section, we present some simulation results obtained 
by applying the proposed MDP-based handover algorithm 
for varying system configurations. Specifically, we studied 
simple scenarios where the number of BSs is fixed to 3, 
while the number of UEs varies between 3 and 6. Moreover, 
to better assess the performance of the proposed model, we 
generate the results for different handover costs, defined as the 
percentage of resources spent for signaling and flow rerouting, 
as introduced in Section II. The channel matrix shown in (14) 
is obtained as per the description given in Section II, where 
$T_{avg} = [t_{LOS}, t_{NLOS}, t_{out}] = [5, 25, 3]$, which characterizes 
an urban scenario where the dominant link is NLOS: 

\[ P = \begin{bmatrix} 0.55 & 0.3 & 0.15 \\ 0.01 & 0.8 & 0.19 \\ 0.38 & 0.40 & 0.22 \end{bmatrix}. \tag{14} \]

In addition, we consider a 28 GHz carrier frequency with 
1 GHz bandwidth, a slot duration equal to 125 μs and 30 
OFDM symbols per slot (6 for control, 24 for data). 
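
As a quick consistency check on the channel matrix in (14), the Python sketch below computes its stationary distribution by repeated multiplication. It is an illustration only: the function name is hypothetical, and the state labels assume that the rows of (14) follow the ordering of (1), i.e., outage, NLOS, LOS, which the text does not state explicitly.

```python
def stationary_distribution(P, tol=1e-12, max_iters=100000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        # One step of pi <- pi P.
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Channel matrix from (14); row/column order assumed to be outage, NLOS, LOS.
P = [[0.55, 0.30, 0.15],
     [0.01, 0.80, 0.19],
     [0.38, 0.40, 0.22]]
STATES = ["out", "NLOS", "LOS"]  # assumed labelling

pi = stationary_distribution(P)
dominant = STATES[pi.index(max(pi))]
```

Under this labelling the middle state carries most of the stationary probability, in line with the description of a scenario dominated by NLOS links.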

We use the optimal policy obtained with our algorithm, 
and compare its performance against other cell selection 
approaches, namely: 

• Load: Each UE connects to the least loaded BS. If two or 
more BSs show the same occupancy level, UEs randomly 
select one of them; 

• Rate: UEs associate with the BS that can offer the best 
instantaneous rate, which depends on both channel and 
load information; 

• Channel: Traditional approach where UEs select the BS 
offering the best channel (SNR-based); 

• Upper Bound: Centralized exhaustive search method; 
it requires global information about link qualities along 
with cell occupancy, and exploits UE coordination, which 
is unavailable in distributed schemes. Hence, this 
approach represents an upper bound. 
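
The three distributed baselines listed above can be sketched as simple selection rules in Python. The sketch is illustrative: the per-state rates in `RATE_OF` are assumed values, the function names are hypothetical, and the Rate rule assumes that the instantaneous rate is the stand-alone rate shared with the UEs already on the BS, mirroring the load sharing used in the reward.

```python
import random

# Hypothetical per-state rates in bits/s/Hz (assumed values).
RATE_OF = {"out": 0.0, "NLOS": 1.0, "LOS": 4.0}


def select_load(channels, loads, rng=random):
    """Load: join the least loaded BS, breaking ties at random."""
    least = min(loads)
    return rng.choice([i for i, u in enumerate(loads) if u == least])


def select_channel(channels, loads):
    """Channel: join the BS with the best channel, ignoring load."""
    return max(range(len(channels)), key=lambda i: RATE_OF[channels[i]])


def select_rate(channels, loads):
    """Rate: join the BS with the best instantaneous rate (channel and load)."""
    return max(range(len(channels)),
               key=lambda i: RATE_OF[channels[i]] / (loads[i] + 1))


# One UE observing three BSs: channel state and current load of each.
channels = ["LOS", "NLOS", "out"]
loads = [4, 0, 1]
choices = (select_load(channels, loads),
           select_channel(channels, loads),
           select_rate(channels, loads))
```

In this example the Channel rule picks the heavily loaded LOS cell, while the Load and Rate rules both pick the empty NLOS cell, which shows how the three heuristics can disagree on the same observation.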











































Table I: Average Spectral Efficiency Gain (%) 

          3% OH    6% OH    10% OH    30% OH 
3 UEs     37%      39%      42%       51% 
4 UEs     33%      35%      39%       50% 
5 UEs     30%      32%      36%       50% 
6 UEs     26%      28%      33%       50% 

In Figs. 1 and 2 we report the results for the case of 
3 UEs with 3 BSs and handover cost 10%. We plot the 
average spectral efficiency (bits/s/Hz) and the average number 
of handovers. We can observe how the optimal policy obtained 
by solving the MDP described in Section III outperforms 
other approaches. In particular, we can note that the 
Load-based scheme, which relies solely on occupancy information, 
is very inefficient and results in biasing all the UEs towards 
unloaded cells. As a consequence, BSs will be overloaded, 
thus explaining the low rate observed in Fig. 1. On the other 
hand, Channel- and Rate-based schemes show better 
performance but, because of the channel variations that characterize 
mmWave links, instantaneous actions are highly inefficient. 

Instead, our MDP model, where the dynamics of the links 
are fully captured, can be seen to provide significantly better 
performance. This not only results in increased sum-rate, but 
also provides a greatly reduced number of handovers, as shown 
in Fig. 2, thus representing a more energy-efficient solution. 
The Upper Bound refers to a centralized scheme, which 
compared to our distributed scheme has the advantage of full 
knowledge and of coordinated decisions, thereby resulting in 
significantly better performance in general. Nevertheless, it can 
be observed that despite such big advantages the performance 
gap between our solution and the centralized upper bound 
is not very wide, showing that our solution (which is not 
necessarily the distributed optimum because the problem is 
non-convex) still achieves a fairly good performance. 

In Table I we report a more detailed sum-rate comparison 
of our MDP-based model against a traditional Channel-based 
association scheme in terms of average spectral efficiency 
gain (%). We can observe significant gains in the MDP-based 
approach, which increase as the HO cost increases. The ability 
to capture optimal solutions for more complex scenarios may 
lead us to draw some important conclusions about the shape 
of effective policies, and to find heuristics to better address a 
number of critical issues related to mmWave cellular networks. 



Figure 3: Number of states at varying number of BSs, L, and UEs, N. 


V. Complexity analysis 

In this section, we aim at analyzing the complexity of our 
MDP model in terms of number of states required as a function 
of the number of UEs and BSs. 

First of all, we will analytically derive the number of load 
occupancy combinations, for at most 5 BSs.^3 Let us introduce 
q(·), expressed as 

\[ q(x) = \max\{x, 0\}. \tag{15} \]

Now, we can count the possible occupancy combinations 
for 1, 2, 3, 4 BSs as a function of the number of UEs N, i.e., 

\[ c_1(N) = N + 1, \tag{16} \]

\[ c_2(N) = [\dots], \tag{17} \]

\[ c_3(N) = [\dots], \tag{18} \]

\[ c_4(N) = [\dots], \tag{19} \]

and derive the number of load occupancy combinations at 
varying number of BSs, L, as follows. 

\[ C_L(N) = c_1(N) + \sum_{i=0}^{N-1} \sum_{k=2}^{L-1} c_k(N - i). \tag{20} \]

As stated above, the number of possible channel states' 
combinations is equal to $K^L$. Therefore, the total number of 
states, as shown in Fig. 3, will be as follows: 

\[ |S(L, K, N)| = K^L C_L(N). \tag{21} \]

In terms of run time, the computational complexity 
(convergence time) of the proposed MDP approach increases 
exponentially with the number of UEs. However, the proposed 
algorithm can be executed offline within specific clusters of 
mmWave cells, at varying number of associated users. Each 
BS will disseminate the optimal policies for various numbers 
of instantaneous connected UEs, thus quickly adapting to 
topology changes, i.e., a new UE coming or leaving. 

^3 A reasonable maximum number of surrounding BSs. 
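
The state count in (20) and (21) can also be evaluated numerically. The Python sketch below does so under an explicit assumption that is not taken from the text: for k ≥ 2, c_k(n) is interpreted as the number of ways to split n UEs over exactly k interchangeable surrounding BSs, each serving at least one UE (the number of partitions of n into exactly k positive parts). The function names are hypothetical, and the closed forms used in the paper may be written differently.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions_exact(n, k):
    """Number of ways to write n as a sum of exactly k positive integers, order ignored."""
    if k == 0:
        return 1 if n == 0 else 0
    if n < k:
        return 0
    # Either one part equals 1 (remove it) or all parts are >= 2 (subtract 1 from each).
    return partitions_exact(n - 1, k - 1) + partitions_exact(n - k, k)


def occupancy_combinations(n_bs, n_ues):
    """C_L(N) = c_1(N) + sum_{i=0}^{N-1} sum_{k=2}^{L-1} c_k(N - i), as in (20).

    Assumes c_1(N) = N + 1 as in (16) and c_k = partitions_exact for k >= 2.
    """
    total = n_ues + 1
    for i in range(n_ues):
        for k in range(2, n_bs):
            total += partitions_exact(n_ues - i, k)
    return total


def number_of_states(n_bs, n_channel_states, n_ues):
    """|S(L, K, N)| = K**L * C_L(N), as in (21)."""
    return n_channel_states ** n_bs * occupancy_combinations(n_bs, n_ues)


# Example: K = 3 channel states, L = 3 BSs, N = 3 UEs (values chosen for illustration).
states = number_of_states(3, 3, 3)
```

For L = 2 the inner sum in (20) is empty, so the count reduces to c_1(N) = N + 1 regardless of the assumed interpretation; for larger L the result depends on it.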



































VI. Conclusions 


[14] H. Zhang, L. Venturino, N. Prasad, P. Li, S. Rangarajan, and X. Wang, 


In this paper, we have argued why harnessing the potential 
of mmWave cellular networks requires revisiting the problem 
of cell selection. Judicious choices in cell selection serve to 
improve the quality of service and increase network capacity, 
while minimizing the signaling overhead caused by sub- 
optimal cell selections and subsequent re-associations. We 
have made the case in favor of using MDPs to design and 
analyze association techniques. Through numerical analysis 
and simulations, we have demonstrated the ability of our 
proposed technique to achieve these goals. In the future, we 
plan to extend this work in several ways: i) evaluate over more 
complex networks to derive procedural guidelines to design 
heuristics; ii) investigate whether finer-grained SNR 
measurements can improve outcomes; iii) examine the effectiveness 
of these techniques in heterogeneous networks. In conclusion, 
although mmWave technology holds the promise to 
revolutionize cellular networks, realizing this potential will require 
revisiting and potentially redesigning several components of 
the communication stack. This paper makes an important step 
in this direction with focus on the problem of cell selection 
in mmWave cellular networks. 